Add distinct key inner join #14990
🔥🔥🔥
@jlowe The prototype should be ready for testing. Please let me know if you find any bugs/issues. I'm testing/tuning several alternative algorithms locally for the probe kernel so no need to worry about the performance for now.
Co-authored-by: Lawrence Mitchell <[email protected]>
Good from my end wrt the one racy read/write
@@ -21,6 +21,9 @@

namespace cudf::detail {

/// Default load factor for cuco data structures
static double constexpr CUCO_DESIRED_LOAD_FACTOR = 0.5;
🎉
/**
 * @brief An comparator adapter wrapping both self comparator and two table comparator
 */
template <typename Equal>
struct comparator_adapter {
Sounds extremely familiar to me 😛. Can you extract the code into some common header (or `row_operators.cuh`) instead of putting it here? We would probably reuse it somewhere else.
Such adapters look similar to each other, but their contents are quite different. For example, the insert comparator always returns false since we know all inserted elements are distinct, and the set key type is a pair of row hash value and row index, which is rarely seen outside join operations.
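To make the two points above concrete, here is a minimal host-only sketch of such an adapter. It is not the code from this PR: the names `build_index`, `probe_index`, `build_key`, `probe_key`, `comparator_adapter_sketch`, and `row_equal` are hypothetical, and plain `std::pair` and `std::vector<int>` stand in for the device-side key type and table rows.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical strong index types so the two overloads below can be told apart.
struct build_index { std::int32_t value; };
struct probe_index { std::int32_t value; };

// Set key: a pair of (row hash value, row index), as described in the comment above.
using build_key = std::pair<std::uint32_t, build_index>;
using probe_key = std::pair<std::uint32_t, probe_index>;

template <typename Equal>
struct comparator_adapter_sketch {
  Equal equal;  // wrapped two-table row comparator

  // Insert path: build rows are known to be distinct, so two build keys
  // never compare equal and no row data needs to be read.
  bool operator()(build_key const&, build_key const&) const noexcept { return false; }

  // Probe path: compare the cached hash values first, then fall back to
  // the wrapped row comparator only when the hashes match.
  bool operator()(probe_key const& probe, build_key const& build) const
  {
    if (probe.first != build.first) { return false; }
    return equal(probe.second, build.second);
  }
};

// Hypothetical row comparator over two single-column integer "tables".
struct row_equal {
  std::vector<int> const* build_rows;
  std::vector<int> const* probe_rows;

  bool operator()(probe_index p, build_index b) const
  {
    return (*probe_rows)[p.value] == (*build_rows)[b.value];
  }
};
```

The insert overload is what makes this adapter specific to the distinct-key case: a general-purpose adapter would have to compare rows on insertion as well.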
LGTM
Co-authored-by: Nghia Truong <[email protected]>
/merge
Adds Java bindings to the distinct hash join functionality added in #14990.
Authors:
- Jason Lowe (https://github.com/jlowe)
Approvers:
- Jim Brennan (https://github.com/jbrennan333)
- Nghia Truong (https://github.com/ttnghia)
Closes #15156
Fixes the invalid global read introduced by #14990 and simplifies the logic.
Authors:
- Yunsong Wang (https://github.com/PointKernel)
Approvers:
- Bradley Dice (https://github.com/bdice)
- David Wendt (https://github.com/davidwendt)
URL: #15164
Description
Contributes to #14948
This PR adds a public `cudf::distinct_hash_join` class that provides a fast code path for joins with distinct keys. Only distinct inner join is tackled in the current PR.
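As a rough CPU analogy of why distinct build keys allow a faster path, the sketch below builds a map from key to row index and then probes it, so each probe row yields at most one match. This is not the `cudf::distinct_hash_join` API or its GPU implementation: `distinct_inner_join_sketch`, `index_pairs`, and the (build indices, probe indices) output order are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical result type: (build row indices, probe row indices) of matching rows.
using index_pairs = std::pair<std::vector<std::size_t>, std::vector<std::size_t>>;

// Inner join assuming every key in build_keys is distinct.
index_pairs distinct_inner_join_sketch(std::vector<int> const& build_keys,
                                       std::vector<int> const& probe_keys)
{
  // Build phase: each key maps to exactly one row, so a plain map
  // (rather than a multimap) is sufficient.
  std::unordered_map<int, std::size_t> build_map;
  build_map.reserve(build_keys.size());
  for (std::size_t i = 0; i < build_keys.size(); ++i) {
    build_map.emplace(build_keys[i], i);
  }

  // Probe phase: one lookup per probe row and at most one output row,
  // so the output size is bounded by the probe table size.
  index_pairs result;
  for (std::size_t i = 0; i < probe_keys.size(); ++i) {
    auto const it = build_map.find(probe_keys[i]);
    if (it != build_map.end()) {
      result.first.push_back(it->second);
      result.second.push_back(i);
    }
  }
  return result;
}
```

Because a probe row can match at most one build row, the output can be sized up front from the probe table, which a general hash join with duplicate keys cannot do.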